Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use iconv.decodeStream instead of iconv.decode; Fixes #374 #375

Merged
merged 6 commits into from
Apr 7, 2022
Merged

Use iconv.decodeStream instead of iconv.decode; Fixes #374 #375

merged 6 commits into from
Apr 7, 2022

Conversation

yetzt
Copy link
Contributor

@yetzt yetzt commented Sep 8, 2021

Use iconv.decodeStream instead of iconv.decode; Fixes #374

@tomas
Copy link
Owner

tomas commented Sep 8, 2021

Thanks for the PR! IIRC a long time ago I tried using iconv's stream interface but wasn't able to for some reason.

Are you sure this won't break anything?

@tomas
Copy link
Owner

tomas commented Sep 8, 2021

Also, does the test suite pass 100% on your end?

@yetzt
Copy link
Contributor Author

yetzt commented Sep 9, 2021

i'd like to refactor StreamDecoder and maybe write some extra tests, but that could take a while to get me to. in the mean time, the easiest hotfix would be not to decode the stream if its unnessecary (from utf-8 to utf-8) (but it would probably still be broken for other multibyte layouts).

i haven't run tests yet.

@yetzt
Copy link
Contributor Author

yetzt commented Sep 9, 2021

the tests from decoder_spec.js worked out fine, i added a minimal test for the bug.

@yetzt
Copy link
Contributor Author

yetzt commented Sep 9, 2021

i extended the tests to include euc-jp and gb18030, omitted decoding if the charset is already utf-8 and ensured the charset found in the magic charset detection is one iconv can actually handle. (more elegant code would be possible without the magic inline charset detection though)

@tomas
Copy link
Owner

tomas commented Sep 9, 2021

Beautiful. Will take a look when I have a minute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: UTF-8 characters across chunk boundaries get corrupted
2 participants